A Search Engine for Handwritten Documents
Abstract
The design and functionality of a versatile search engine for handwritten documents are described. Documents are indexed using global image features, e.g., stroke width, slant, and word gaps, as well as local features that describe the shapes of characters and words. Image indexing is done automatically using page analysis, page segmentation, line separation, word segmentation, and recognition of characters and words. Several types of searches are permitted: (i) Word / Phrase Spotting; (ii) Text to Image Search; (iii) Plain Text Search; (iv) Word Recognition from Lexicon. The words in the document are characterized by various features, which form the basis for the different search techniques. The system was implemented in Microsoft Visual C++. The paper reports on the functional capabilities of the various search techniques and their performance.
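As a rough illustration of the word spotting search type, the sketch below ranks indexed word images by their distance to a query word image. It is not the system described in the abstract (which was written in Visual C++): the `WordImage` record, the three-element feature vectors, and the use of Euclidean distance are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class WordImage:
    doc_id: str       # hypothetical document identifier
    position: int     # index of the word within its document
    features: tuple   # hypothetical fixed-length feature vector


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spot_word(query_features, index, top_k=3):
    """Return the top_k indexed word images closest to the query features."""
    ranked = sorted(index, key=lambda w: euclidean(query_features, w.features))
    return ranked[:top_k]


# Toy index with made-up feature values, for illustration only.
index = [
    WordImage("doc1", 0, (0.9, 0.1, 0.4)),
    WordImage("doc1", 1, (0.2, 0.8, 0.5)),
    WordImage("doc2", 0, (0.85, 0.15, 0.35)),
]

matches = spot_word((0.88, 0.12, 0.38), index, top_k=2)
for m in matches:
    print(m.doc_id, m.position)
```

A real index would hold features extracted from segmented word images, and the distance measure would be chosen to suit those features.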
Similar resources
Versatile Search of Scanned Arabic Handwriting
Searching scanned handwritten documents is a relatively unexplored frontier for documents in any language. In the general search literature, retrieval methods are described as being either image-based or text-based, with the corresponding algorithms being quite different. Versatile search is defined as a framework where the query can be either a textual string or an image snippet in any language ...
A search engine for Arabic documents
This paper is an attempt at indexing and searching degraded document images without recognizing the textual patterns, and so circumventing the cost and laborious effort of OCR technology. The proposed approach deals with text-dominant documents, either handwritten or printed. From the preprocessing and segmentation stages, all the connected components (CC) of the text are extracted applying a ...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
An Ensemble Click Model for Web Document Ranking
Each year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...